Description

The course is designed for students who want to start a career in Big Data and for professionals who want to move into Big Data from another technology. Google Cloud is used throughout the course, and candidates also receive help with interview questions and resume preparation. The course covers MapReduce, HDFS, Sqoop, Hive, Spark, Unix, and Scala.

This is an instructor-led course with an average batch size of 5 students. Over 60 hours of live online training, you will gain both the theoretical and practical knowledge needed to build the necessary skills. The institute’s holistic approach is designed to meet students’ long-term needs: it provides 100% job/placement assistance, and a trial class can be taken before enrolment.

What Will I Learn?

  • Big Data Introduction, the Four Vs of Big Data, MapReduce and HDFS
  • Unix Concepts, Basic Unix Commands and how to write a shell script
  • Scala Basics: Variables, Strings and Numbers

Specifications

  • Free Demo
  • Learn from Experts
  • Interactive Learning
  • Instalment Facility

Big Data Introduction

  • Big Data Introduction
  • What is Big Data and why Big Data?
  • Four Vs of Big Data
  • Scaling problems with existing systems and how Hadoop resolves them
  • What are MapReduce and HDFS
  • Different Hadoop vendors in the industry


Unix

  • Unix Concepts
  • Introduction to Unix
  • Basic Unix Commands
  • How to write a shell script


HDFS & its Architecture

  • Distributed Computing – NameNode and DataNode concepts
  • HDFS Introduction and Architecture
  • What are blocks in HDFS and how do they make Hadoop fault tolerant
  • What is the Secondary NameNode
  • What is checkpointing in Hadoop 1.0
  • Differences between Hadoop 1.0 and Hadoop 2.0
  • HDFS configuration files and how to change the block size on a cluster
  • Hadoop File System Commands (see the sketch after this list)
  • Assignment on HDFS
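
The shell commands covered in this module map closely onto Hadoop's programmatic API. Below is a minimal sketch of common HDFS operations from Scala using the Hadoop FileSystem API; the paths and names are illustrative assumptions, and each call is annotated with the hdfs dfs command it mirrors:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HdfsBasics {
      def main(args: Array[String]): Unit = {
        // Picks up fs.defaultFS from the cluster's core-site.xml on the classpath
        val fs = FileSystem.get(new Configuration())

        val dir = new Path("/user/demo/input")              // hypothetical path
        fs.mkdirs(dir)                                      // hdfs dfs -mkdir -p
        fs.copyFromLocalFile(new Path("data.txt"), dir)     // hdfs dfs -put
        fs.listStatus(dir).foreach(s => println(s.getPath)) // hdfs dfs -ls
        fs.delete(dir, true)                                // hdfs dfs -rm -r
      }
    }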


MapReduce and Its Architecture

  • Different phases of MapReduce and the execution flow
  • What is an Input Split in MR
  • The Word Count problem in MR (see the sketch after this list)
  • The joining problem in MR
  • How to develop and submit MR code on a Hadoop cluster
  • Assignment on MapReduce
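
As a companion to the Word Count topic above, here is a compact sketch of the classic two-phase job against the Hadoop MapReduce API, written in Scala (courses often present it in Java; the class names here are illustrative assumptions):

    import java.lang
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    // Map phase: emit (word, 1) for every token in the input split
    class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
      private val one  = new IntWritable(1)
      private val word = new Text()
      override def map(key: LongWritable, value: Text,
                       ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
        value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
          word.set(w); ctx.write(word, one)
        }
    }

    // Reduce phase: sum the counts that the shuffle grouped under each word
    class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
      override def reduce(key: Text, values: lang.Iterable[IntWritable],
                          ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
        var sum = 0
        values.forEach(v => sum += v.get)
        ctx.write(key, new IntWritable(sum))
      }
    }

    object WordCount {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "word count")
        job.setJarByClass(classOf[TokenMapper])
        job.setMapperClass(classOf[TokenMapper])
        job.setReducerClass(classOf[SumReducer])
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }

Packaged as a JAR, a job like this is submitted to the cluster with hadoop jar, e.g. hadoop jar wordcount.jar WordCount <input> <output>.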


Yarn

  • Why Yarn
  • Components of Yarn and its architecture
  • How the ResourceManager functions
  • NodeManager responsibilities
  • How the ApplicationMaster works
  • Different Schedulers in Yarn


Sqoop

  • What is Sqoop and why is it used
  • Import Data from RDBMS to HDFS
  • Full vs Incremental Data Import
  • Different File formats to Import Data
  • Various methods to Import Data
  • Performance Tuning
  • Sqoop Jobs
  • Automate Sqoop using Shell Script
  • Sqoop Export from Hadoop to RDBMS


Hive / Impala

  • Hive Introduction
  • Datatypes in Hive
  • Architecture of Hive
  • How to create databases and tables using different file formats (see the sketch after this list)
  • Different ways to load data into Hive tables
  • Views in Hive
  • External vs Internal Tables
  • Partitioning vs bucketing
  • Static vs Dynamic Partitioning
  • Joins In Hive
  • Map-side joins in Hive
  • Analytical functions in Hive
  • Performance tuning
  • Hive shell vs Beeline Shell
  • Hive Execution Modes – MapReduce, Tez/Spark
  • What is Impala and how is it different from Hive
  • Assignment
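
The table topics above are exercised in Hive's own shells (Hive CLI or Beeline); as a bridge to the Spark module, the same DDL can also be issued from Scala through Spark's Hive integration. A hedged sketch, assuming a configured Hive metastore and using made-up table names:

    import org.apache.spark.sql.SparkSession

    object HiveDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-demo")
          .enableHiveSupport()   // requires a reachable Hive metastore
          .getOrCreate()

        // Managed (internal), partitioned table stored as Parquet
        spark.sql("""CREATE TABLE IF NOT EXISTS sales (id INT, amount DOUBLE)
                     PARTITIONED BY (country STRING) STORED AS PARQUET""")

        // Dynamic partitioning: the partition value comes from the data itself
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql("""INSERT INTO sales PARTITION (country)
                     VALUES (1, 9.99, 'IN'), (2, 4.50, 'US')""")

        spark.sql("SELECT country, SUM(amount) FROM sales GROUP BY country").show()
        spark.stop()
      }
    }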


Scala

  • Scala Basics
  • Variables, Strings and Numbers
  • Arrays, Lists, Tuples and Maps
  • For loops, if-else and match expressions
  • Functions, Objects and Classes
  • What is a case class in Scala
  • The Scala REPL
  • How to write and run a Scala program in an IDE (see the sketch after this list)
  • Assignment
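
A compact sketch touching each of the basics listed above (immutable vals vs mutable vars, the core collections, a guarded for loop, and pattern matching on a case class); paste it into the Scala REPL or run it from an IDE:

    object ScalaBasics {
      // Case classes get equals, hashCode, toString and pattern-matching support for free
      case class Point(x: Int, y: Int)

      def main(args: Array[String]): Unit = {
        val name: String = "big data"  // immutable value
        var count = 0                  // mutable variable; type Int is inferred
        count += 1

        val nums     = List(1, 2, 3)
        val pair     = ("hadoop", 2)                  // a tuple
        val versions = Map("spark" -> 3, "hive" -> 3) // a map

        for (n <- nums if n % 2 == 1) println(n)      // for loop with a guard

        val quadrant = Point(1, 2) match {            // match on a case class
          case Point(0, 0)          => "origin"
          case Point(x, _) if x > 0 => "right half"
          case _                    => "elsewhere"
        }
        println(s"$name $count $pair ${versions("spark")} $quadrant")
      }
    }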


Spark

  • Introduction to Spark
  • What are RDDs?
  • How to Create RDDs
  • Transformations in RDD
  • Actions in RDD
  • Lazy evaluation in Spark
  • Lineage Graph in RDD
  • What are pair RDDs and when are they used
  • What are DataFrames in Spark and how are they different from RDDs
  • How to create DataFrames (see the sketch after this list)
  • How to load data from an RDBMS into Hadoop using Spark
  • How to perform transformations using the DataFrame API
  • What is a broadcast join in Spark
  • Cache vs Persist in Spark
  • Performance tuning in Spark
  • What are Datasets in Spark and how are they different from the DataFrame API
  • Assignment on Spark
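
To tie the RDD and DataFrame topics together, here is a minimal local-mode sketch in Scala (the data, names and commented-out JDBC settings are illustrative assumptions):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object SparkDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("spark-demo")
          .master("local[*]")   // local mode for experimentation
          .getOrCreate()
        import spark.implicits._

        // RDDs: transformations are lazy; nothing runs until an action
        val lines  = spark.sparkContext.parallelize(Seq("big data", "big deal"))
        val counts = lines.flatMap(_.split(" "))  // transformation
          .map(w => (w, 1))                       // makes this a pair RDD
          .reduceByKey(_ + _)                     // still lazy
        counts.collect().foreach(println)         // action: triggers the whole lineage

        // DataFrames: schema-aware and optimized by Catalyst
        val df = Seq(("alice", 34), ("bob", 29)).toDF("name", "age")
        df.cache()                                // cache() = persist() at the default level
        df.filter($"age" > 30).select(upper($"name")).show()

        // Loading from an RDBMS would look like this (hypothetical connection details):
        // spark.read.format("jdbc").option("url", "jdbc:mysql://host/db")
        //   .option("dbtable", "sales").option("user", "u").option("password", "p").load()

        spark.stop()
      }
    }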

Lakesh Kumar

An IT professional with 7 years of experience, including 5 years of extensive experience in Hadoop Big Data technologies. Well versed in MapReduce, HDFS, Sqoop, Hive, Spark, Unix, and Scala.

Fee: ₹25,000

